bayroot: A Bayesian phylogenetic approach to
dating HIV reservoir sequences
Roux-Cil Ferreira and Art Poon
Department of Pathology and Laboratory Medicine,
Western University, Canada
The latent HIV reservoir
- HIV provirus in the latent reservoir may come from an early or late
stage of infection.
- Adaptation to host-specific immune response, drug resistance.
- Is long-term persistence of the reservoir due to clonal expansion or
ongoing replication?
- Replication in anatomic compartments with low drug penetrance would
replenish a “young” reservoir.
Dating HIV integration events
-
Replication-competent provirus from the latent reservoir can be
sequenced from viral outgrowth experiments.
-
One approach is to build a phylogeny and map outgrowth sequences to
dated plasma HIV RNA.
-
A tree is an incomplete sample of lineages, uncertainty in topology and
location of root.
|
Image
source: Abrahams et al. (2019) The replication-competent HIV-1 latent
reservoir is primarily established near the time of therapy initiation.
Science Trans Med 11(513).
|
Root-to-tip regression
-
Under a strict molecular clock, the expected number of mutations
increases linearly with time.
-
A root-to-tip
(RTT) method regresses divergence from the root (\(y\)-axis) on sample collection dates (\(x\)-axis).
-
Fit regression to HIV RNA only, use model to predict integration dates.
-
May be more robust to uncertainty in tree.
|
Image source:
Buonagurio DA, et al.. Evolution of human influenza A viruses
over 50 years: rapid, uniform rate of change in NS gene. Science. 1986
May 23;232(4753):980-2.
|
Problems with RTT dating
-
Ignores variation in number of mutations — there is one, and only one,
estimated date.
-
Unclear how to deal with divergent outgrowth sequences.
-
Proviral sequences can be mapped to the future (right, red zone).
-
Assumes location of root and clock rate are known without error.
|
|
Poisson likelihood
- A strict clock assumes that mutations accumulate at a constant rate
\(\mu\) over time, so variation is
equal to the mean (Poisson).
- Let \(Y_i\) be the number of
mutations in the \(i^{\textrm{th}}\)
sequence, given location of root.
- Let \(\Delta t_i\) be the time
elapsed between the \(i^{\textrm{th}}\)
sample and the root.
- The log-likelihood for RNA set \(\{Y_i,
\Delta t_i\}\) is:
\[\log L(Y_i, \Delta t_i) = \sum_i
Y_i\log(\mu \Delta t_i) - \mu \Delta t_i - \log
\Gamma(Y_i+1)\]
Metropolis-Hastings sampling
| Root |
Uniform |
\(\textrm{Unif}(-\delta,
+\delta)\), reflection on tips and random choice of branches at
splits. |
| Clock rate |
Lognormal |
\(\textrm{Unif}(-\delta, +\delta)\)
proposal reflecting on zero. |
| Origin date |
Uniform |
Truncated normal proposal with mean 0 and variance \(\sigma\). |
Sampling integration dates
Using Bayes’ rule, the probability of integration time \(t_i\) for outgrowth sequence with
divergence \(Y_i\) is: \[P(t_i|Y_i) = \frac{P(Y_i|t_i)
P(t_i)}{P(Y_i)}\]
We assume \(P(t_i) =
\frac{1}{T-t_0}\), where \(t_0\)
is the origin date and \(T\) is the
maximum integration date, and letting \(s=t-t_0\):
\[
P(Y_i) = \frac{\int_0^{T-t_0} (\mu s)^{Y_i} \exp(-\mu s)
\mathrm{d}s}{(T-t_0)\Gamma(Y_i+1)}
= \frac{\gamma(Y_i+1, \mu(T-t_0))}{\mu(T-t_0)\Gamma(Y_i+1)}
\]
where \(\gamma(s, x)\) is the lower
incomplete gamma function.
Sampling integration dates (2)
Letting \(\Lambda=\mu(T-t_0)\),
the probability of \(t_i\) given \(y_i\) mutations simplifies to: \[
P(t_i | Y_i) = \frac{\mu \Lambda^{y_i}\exp(-\Lambda)}{\gamma(Y_i+1,
\Lambda)}
\]
Since we can’t solve for the inverse CDF, we use a simple
rejection method to sample integration times.
- \(t_0\), \(y_i\) and \(\mu\) are sampled from the posterior
distribution.
Simulation model
- We used an exact stochastic simulation module in R
(
twt) to simulate cell population dynamics (forward
time).
- Discrete events with exponential waiting times determine population
dynamics over time.

Simulating population dynamics
-
Branching events require a “source” cell to induce a “target” cell to
change state.
-
Transmission of virus from an infected cell to a susceptible cell.
-
Virus replication completely blocked by ART initiation at time \(t=10\).
-
Transition events occur when a cell switches from one state
(compartment) to another.
-
e.g., reactivation of a latently-infected cell.
|
|
Simulating trees
-
From the population size trajectories,
twt samples trees in
reverse time for given sampling times and cell types.
-
▲ = branching event (infection from active cell, or clonal expansion of
latent cell)
-
○ = state transition (from active to latent or vice versa)
-
Retaining these tree annotations lets us collapse branches associated
with latently-infected cells.
|
|
Simulating evolution
- Ran trees from
twt through INDELible
- TN93 nucleotide substitution model with 40%
As.
- Seeded with an HIV-1 subtype C sequence at the root (AY772699)
- Reconstructed trees from simulated alignments with FastTree2 (double
precision build).
Comparing bayroot to root-to-tip regression
When clock signal is strong, advantage of bayroot is
largely due to prior information about ART initiation.
Integration date estimates for one replicate simulation.
|
Paired
Wilcoxon sign-rank test, \(P=3.55\times
10^{-4}\), \(n=50\).
|
… and with greater uncertainty
Integration date estimates for one replicate simulation.
|
Paired Wilcoxon sign-rank test, \(P=3.82\times
10^{-7}\), \(n=50\).
|
Results with a narrow prior on root date; with a less informative
prior, bayroot results become similar to RTT.
Application to real data
ZM1044M, Zambia-Emory HIV Research Project
Brooks et al., 2020, PLOS Pathog 16(6): e1008378.
Further work
- A simple and efficient means of incorporating prior knowledge and
managing uncertainty in dating the HIV reservoir.
- Characterize sensitivity of
bayroot to varying model
parameters, e.g., transition of active infected T-cell to
resting state.
- R package available from
https://github.com/PoonLab/bayroot
This study was supported by funding from CIHR and NIH (REACH
AI164565-01). RC was supported by an Ontario Genomics-CANSSI Fellowship
in Genome Data Science.